Policy Optimization with Stochastic Mirror Descent
Authors
Abstract
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm: a sample-efficient policy gradient method based on stochastic mirror descent. In VRMPO, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed VRMPO needs only O(ε⁻³) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best-known sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms state-of-the-art policy gradient methods in various settings.
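As a rough illustration of the two ingredients named in the abstract (a mirror-descent-style policy update combined with a variance-reduced gradient estimator), the following Python sketch is a hedged approximation rather than the authors' exact VRMPO implementation; the sampling oracle `sample_policy_gradient`, the batch sizes, and the Euclidean mirror map are illustrative assumptions.

```python
import numpy as np

# Hedged sketch (not the authors' exact VRMPO implementation) of a stochastic
# mirror descent policy update with a recursive, SPIDER/SARAH-style
# variance-reduced gradient estimator. `sample_policy_gradient(theta, n)` is
# an assumed oracle returning a Monte-Carlo policy-gradient estimate at
# parameters `theta` from `n` sampled trajectories.

def mirror_descent_step(theta, grad, step_size):
    """Mirror descent update with the Euclidean mirror map, which reduces to a
    plain ascent step; other Bregman divergences can be swapped in."""
    return theta + step_size * grad  # ascent: maximize expected return

def vr_policy_optimization(theta0, sample_policy_gradient,
                           outer_iters=100, inner_iters=10,
                           large_batch=500, small_batch=10, step_size=1e-2):
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(outer_iters):
        # Large-batch gradient estimate used as the reference point.
        v = sample_policy_gradient(theta, large_batch)
        theta_prev = theta.copy()
        theta = mirror_descent_step(theta, v, step_size)
        for _ in range(inner_iters):
            # Recursive variance-reduced correction from small batches.
            # (Importance weighting between theta and theta_prev is omitted
            # for brevity; a faithful estimator would include it.)
            g_new = sample_policy_gradient(theta, small_batch)
            g_old = sample_policy_gradient(theta_prev, small_batch)
            v = g_new - g_old + v
            theta_prev = theta.copy()
            theta = mirror_descent_step(theta, v, step_size)
    return theta
```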
Similar resources
Stochastic Block Mirror Descent Methods for Nonsmooth and Stochastic Optimization
In this paper, we present a new stochastic algorithm, namely the stochastic block mirror descent (SBMD) method for solving large-scale nonsmooth and stochastic optimization problems. The basic idea of this algorithm is to incorporate the block-coordinate decomposition and an incremental block averaging scheme into the classic (stochastic) mirror-descent method, in order to significantly reduce ...
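For intuition, here is a minimal Python sketch of the block-coordinate idea described above, under the simplifying assumptions of a Euclidean mirror map and an assumed stochastic gradient oracle `stochastic_grad`; it is not the paper's exact SBMD method.

```python
import numpy as np

# Illustrative sketch, not the paper's exact SBMD method: each iteration draws
# one coordinate block at random and applies a stochastic gradient step only
# to that block, while keeping an incrementally averaged iterate.

def sbmd(x0, stochastic_grad, blocks, iters=1000, step_size=1e-2, seed=0):
    """`blocks` is a list of index arrays partitioning the coordinates of x;
    `stochastic_grad(x)` returns an unbiased estimate of the full gradient."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    avg = x.copy()                              # incremental averaging of iterates
    for t in range(1, iters + 1):
        g = stochastic_grad(x)
        b = blocks[rng.integers(len(blocks))]   # sample a random block
        x[b] = x[b] - step_size * g[b]          # update only that block
        avg += (x - avg) / t                    # running average (ergodic output)
    return avg
```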
Stochastic Mirror Descent in Variationally Coherent Optimization Problems
In this paper, we examine a class of non-convex stochastic optimization problems which we call variationally coherent, and which properly includes pseudo-/quasiconvex and star-convex optimization problems. To solve such problems, we focus on the widely used stochastic mirror descent (SMD) family of algorithms (which contains stochastic gradient descent as a special case), and we show that the l...
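For reference, a small sketch of one member of the SMD family mentioned above, using the entropic mirror map (exponentiated gradient on the probability simplex); `stochastic_grad` is an assumed oracle and the setup is illustrative only.

```python
import numpy as np

# Sketch of one SMD family member: the entropic mirror map, i.e.
# exponentiated gradient on the probability simplex. `stochastic_grad(x)` is
# an assumed oracle returning an unbiased gradient estimate at x.

def smd_entropic(x0, stochastic_grad, iters=1000, step_size=0.1):
    x = np.asarray(x0, dtype=float)
    x = x / x.sum()                          # start on the simplex
    for _ in range(iters):
        g = stochastic_grad(x)
        x = x * np.exp(-step_size * g)       # multiplicative (mirror) update
        x = x / x.sum()                      # Bregman projection back to the simplex
    return x
```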
Guided Policy Search as Approximate Mirror Descent
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...
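The mimic-the-teacher loop sketched below is a schematic reading of the description above, not the paper's algorithm; the `teacher_actions`, `collect_states`, and `fit` callables are hypothetical placeholders for a trajectory optimizer, rollout collection, and a supervised-learning step.

```python
# Schematic sketch of the guided-policy-search loop described above, not the
# paper's exact algorithm. The callables are hypothetical placeholders:
#   collect_states(policy)       rolls out the policy and returns visited states,
#   teacher_actions(states)      returns the teacher's target actions,
#   fit(policy, states, targets) performs one supervised-learning update.

def guided_policy_search(policy, teacher_actions, collect_states, fit,
                         iterations=10):
    """Alternate between querying the teacher and regressing the policy onto
    the teacher's actions, instead of taking direct policy-gradient steps in
    the high-dimensional parameter space."""
    for _ in range(iterations):
        states = collect_states(policy)         # roll out the current policy
        targets = teacher_actions(states)       # teacher's target actions
        policy = fit(policy, states, targets)   # supervised projection step
    return policy
```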
Active Model Aggregation via Stochastic Mirror Descent
We consider the problem of learning convex aggregation of models, that is as good as the best convex aggregation, for the binary classification problem. Working in the stream based active learning setting, where the active learner has to make a decision on-the-fly, if it wants to query for the label of the point currently seen in the stream, we propose a stochastic-mirror descent algorithm, cal...
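A hedged sketch of the stream-based setting described above: convex aggregation weights over base models are maintained with an entropic mirror-descent update, and a label is queried only when the models disagree; the disagreement rule and squared loss are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

# Hedged sketch of stream-based active aggregation. Convex weights over base
# models are updated with an entropic mirror-descent (exponentiated-gradient)
# step; a label is queried only under sufficient disagreement. The threshold
# `tau` and the squared loss are illustrative assumptions.

def active_aggregation(models, stream, labeler, step_size=0.5, tau=0.2):
    w = np.ones(len(models)) / len(models)          # convex aggregation weights
    for x in stream:
        preds = np.array([m(x) for m in models])    # each model scores x in [0, 1]
        if np.ptp(preds) > tau:                     # query only under disagreement
            y = labeler(x)                          # request the label
            losses = (preds - y) ** 2               # per-model squared loss
            w = w * np.exp(-step_size * losses)     # entropic mirror descent step
            w = w / w.sum()                         # stay on the simplex
    return w
```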
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i8.20863